Optimization and High Level Production of Recombinant Synthetic Streptokinase in E. coli Using Response Surface Methodology

Streptokinase (SK) is an extracellular protein of 414 amino acids with considerable clinical importance as a commonly used thrombolytic agent. Given its widespread application and clinical importance, designing more efficient SK production platforms is worth investigating. In this study, a synthetic SK gene was codon-optimized and cloned into the pET21b plasmid for periplasmic expression. Response surface methodology was used to design a total of 20 experiments to optimize IPTG concentration, post-induction period, and cell density at induction (OD600). The optimum levels of the selected parameters were determined to be 0.28 mM IPTG, a post-induction period of 9.889 h, and a cell density (OD600) of 3.40768. Under these optimum conditions, SK activity (7663 IU/mL) was 4.14-fold higher than under the primary expression conditions (1853 IU/mL). Achieving higher SK yields in shake flasks could lead to more cost-effective industrial production of this drug, which is the ultimate aim of SK production studies.


Introduction
Streptokinase (SK) (EC 3.4.99.22) is naturally secreted by many pathogenic strains of hemolytic streptococci as a single-chain (47 kDa) extracellular protein of 414 amino acids, which is synthesized with a 26-amino-acid signal peptide. This protein is of considerable clinical importance as a commonly used thrombolytic agent (1). Reperfusion with thrombolytic agents is one of the established approaches to treating acute ischemic stroke. Although symptomatic cerebral bleeding and reperfusion-associated injury are among the risks of thrombolytic therapy in patients with acute myocardial infarction, the success achieved with this approach has renewed interest in reperfusion with thrombolytic agents such as SK (2). Although SK itself is not a protease, it generates plasmin upon complex formation with plasminogen, and plasmin in turn dissolves the fibrin network of blood clots and solubilizes the degradation products (3,4). The low cost of producing SK, together with its fewer side effects, makes it an attractive agent compared with tissue plasminogen activator, the major enzyme responsible for clot breakdown. Taken together, designing more efficient SK production platforms is clearly worth investigating.
Low yields of SK in native hosts, the pathogenicity of those hosts, and co-production of other antigenic molecules such as streptodornase are the main challenges for high-yield production (5). Heterologous production of SK in Escherichia coli (E. coli), the most widely used prokaryotic system for synthesizing heterologous proteins, can circumvent these limitations (6, 7). Rapid growth, inexpensive cultivation, well-studied genetics and physiology, a wide range of cloning vectors, and high expression levels in large-scale cultures have made E. coli a highly successful system for producing heterologous proteins (8-10). Because heterologous protein expression in E. coli depends on the composition of the culture medium as well as on the expression conditions, expression vector design, promoter strength, host strain, and codon usage, optimizing these factors is highly recommended before industrial-scale production (11). Response surface methodology (RSM) is an efficient way to identify the optimum conditions for heterologous expression with a minimal number of experiments.
In this study, the SK gene was designed and optimized for prokaryotic expression and cloned into the pET21b expression vector. The RSM approach was then used to optimize the expression conditions. A total of 20 experiments were carried out to identify the optimum levels of three significant cultivation conditions: isopropyl β-D-1-thiogalactopyranoside (IPTG) concentration, post-induction period, and cell density at induction (OD600). The optimization resulted in high periplasmic SK expression, which is of great significance for industrial production.

Chemicals, bacterial strains, vectors and DNA techniques
Bacteria were grown in Luria-Bertani (LB) medium (Merck, Germany) at 37 °C with shaking at 200 rpm. Restriction enzymes were purchased from Takara (Shiga, Japan). Human plasminogen and S-2251 were purchased from Sigma Chemical Company (USA). All other chemicals were of analytical grade. E. coli DH5α (Stratagene, USA) (F− gyrA96 (Nalr) recA1 relA1 thi-1 hsdR17 (rK− mK+)) was used as the primary host for transformation, and E. coli BL21 (DE3) pLysS (F− ompT hsdSB (rB− mB−) dcm gal (DE3) pLysS (Cmr)) was used to express the recombinant SK protein.
The pET-21b vector (Novagen) was used to overexpress the recombinant protein.

Codon optimization, cloning and construction of the Streptokinase expression vector
The codon preferences of E. coli and Streptococcus pyogenes (S. pyogenes) differ significantly. The coding sequence (1323 bp) of S. pyogenes streptokinase (GenBank accession no. M19347.1) was retrieved, and its codons were optimized according to E. coli codon usage using the NCBI-derived Codon Usage Database (http://www.kazusa.or.jp/codon). The optimized SK gene, fused to the PelB signal sequence and flanked by NdeI and BamHI restriction sites, was synthesized by Shinegene (China) (GenBank accession no. KT156726.1) and cloned into the pUC57 plasmid, yielding pUC-SK. The synthetic PelB-SK fragment was then inserted into pET-21b between the NdeI and BamHI sites. The ligation products were transformed into competent E. coli BL21 (DE3) pLysS cells by the CaCl2 method. Transformants were screened on LB containing 100 μg/mL ampicillin, and correct cloning was confirmed by colony PCR and sequencing.
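Directional cloning into the NdeI/BamHI sites only works if the insert carries exactly one NdeI site (CA^TATG) upstream of exactly one BamHI site (G^GATCC), with no internal copies that would cut the gene. The following is a minimal Python sketch of that check, assuming the insert is available as a plain DNA string; the function names are hypothetical and the toy sequence is not the KT156726.1 sequence.

```python
# Recognition sequences of the two enzymes used for directional cloning.
NDEI = "CATATG"   # NdeI cuts CA^TATG; its ATG can serve as the start codon
BAMHI = "GGATCC"  # BamHI cuts G^GATCC


def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def check_insert(seq):
    """Check for exactly one NdeI site lying upstream of exactly one BamHI site.

    Both sites are palindromic, so scanning one strand also covers the other.
    """
    nde = find_sites(seq, NDEI)
    bam = find_sites(seq, BAMHI)
    ok = len(nde) == 1 and len(bam) == 1 and nde[0] < bam[0]
    return {"ndeI": nde, "bamHI": bam, "ok": ok}


# Toy insert: NdeI site, a short made-up stretch, BamHI site.
print(check_insert("CATATG" + "AAAGCT" * 3 + "GGATCC"))
# {'ndeI': [0], 'bamHI': [24], 'ok': True}
```

An insert that contained a second, internal BamHI site would return `ok: False`, which is the situation the codon-optimization step is meant to rule out.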

Induction and Expression of Recombinant Streptokinase in E. coli BL21 (DE3)
A single colony of E. coli BL21 (DE3) harboring the recombinant plasmid (pET21b-PelB-SK) was inoculated into 5 mL of LB medium supplemented with 100 μg/mL ampicillin and grown overnight at 37 °C with shaking at 200 rpm. In the shake-flask experiments, SK expression was induced with 0.5 mM IPTG when the cell density reached OD600 = 0.8, and expression was continued for 12 h.

Isolation of periplasmic SK, renaturation of inclusion bodies and molecular weight analysis
Cells were harvested by centrifugation at 4500 × g for 10 min at 4 °C, and the periplasmic fraction was obtained by resuspending the cell pellet in an equal volume of STE buffer (1 mg/mL lysozyme, 20% (w/v) sucrose, 30 mM Tris-HCl (pH 8.1), 1 mM EDTA) and incubating on ice for 10 min. Cell debris was then removed by centrifugation as described in our previous paper (12).
The periplasmic SK was recovered as insoluble inclusion bodies and therefore had to be renatured to obtain the active form of the recombinant enzyme. The inclusion body pellets were first solubilized in 8 M urea buffer (pH 8), and the mixture was incubated at 25 °C for 1 h before insoluble material was removed by centrifugation. The solution was then diluted with phosphate buffer (pH 10.7) for SK renaturation and dialyzed overnight at 4 °C against 20 mM Tris-HCl (pH 8.0), 50 mM NaCl, 1 mM EDTA.
Recombinant SK protein was analyzed by 15% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), and the SK band was visualized by Coomassie Blue staining. Total protein concentration was determined by the Bradford method using bovine serum albumin (BSA) as the standard.

Overexpression of recombinant Streptokinase using response surface methodology
Central composite design (CCD) and response surface methodology (RSM) were used to optimize the expression level of recombinant Streptokinase in E. coli. Minitab 16 (Minitab Inc., USA) was used to design the experiments for optimizing three cultivation conditions: IPTG concentration, post-induction period, and cell density at induction (OD600). Twenty experimental runs, including six replicated center points, were designed and performed. The experimental data were analyzed by regression using the second-order model:
$$Y = \beta_0 + \sum_{i=1}^{3}\beta_i x_i + \sum_{i=1}^{3}\beta_{ii} x_i^2 + \sum_{i<j}\beta_{ij} x_i x_j + \varepsilon$$
where Y is the predicted streptokinase mRNA percentage, β0 is a constant coefficient, βi are the linear coefficients, βii the quadratic coefficients, and βij the cross-product (interaction) coefficients; xi and xj are the levels of the independent variables, and ε is the residual error. Design-Expert 7.0.0 was used for data analysis of the experimental design and the response surface. The transcriptional level of recombinant Streptokinase under each condition was measured by real-time PCR. Recombinant E. coli host cells transformed with the pET21b-PelB-SK vector but not induced were used as the negative control.
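The second-order model above is linear in its coefficients, so the β values can be estimated by ordinary least squares on an expanded design matrix. The sketch below assumes NumPy and coded factor levels; it illustrates the regression step in general and is not the Design-Expert implementation. The function names are hypothetical.

```python
import itertools

import numpy as np


def quadratic_design_matrix(X):
    """Model matrix for the full second-order model in coded variables.

    Columns, in order: intercept, linear x_i, quadratic x_i**2,
    and cross-products x_i*x_j for i < j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)


def fit_quadratic(X, y):
    """Ordinary least-squares estimates of (beta_0, beta_i, beta_ii, beta_ij)."""
    A = quadratic_design_matrix(X)
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta


def predict(X, beta):
    """Predicted response Y for coded settings X (the error term is omitted)."""
    return quadratic_design_matrix(X) @ beta
```

With three factors the model has 1 + 3 + 3 + 3 = 10 coefficients, so a 20-run design leaves 10 residual degrees of freedom for estimating error.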

Quantitative analyses by real-time PCR and ΔΔCt method
Total RNA was isolated from E. coli BL21 cells carrying recombinant SK using TRIzol reagent (Life Technologies, USA) according to the standard protocol. Cells were first homogenized in 1 mL TRIzol, 200 μL chloroform was added, and the samples were mixed thoroughly for 3 min. The mixture was then centrifuged at 12,000 × g for 15 min at 4 °C. The upper aqueous phase was carefully transferred to a new tube without disturbing the interphase, and an equal volume of isopropanol was added. The mixture was mixed thoroughly, incubated at −20 °C for 30 min, and centrifuged at 12,000 × g for 10 min at 4 °C. The RNA pellet was washed with 1 mL of 75% (v/v) ethanol and recovered by centrifugation at 12,000 × g for 5 min at 4 °C. RNA samples were air-dried for 2-3 min and resuspended in 30 μL diethyl pyrocarbonate (DEPC)-treated water (Life Technologies, USA). RNA was quantified by NanoDrop absorbance measurement at 260 and 280 nm, and RNA quality was checked by gel electrophoresis. cDNA was synthesized with a Thermo Scientific cDNA synthesis kit. Reverse transcription was performed using 50 μg total RNA (in a maximum of 20 μL) and 1 μL random hexamer primers; the volume was adjusted to 12 μL with RNase-free water, and the mixture was incubated for 5 min at 70 °C and then for 10 min at room temperature to allow the primers to anneal to the RNA. SK expression levels were analyzed by real-time PCR using Power SYBR® Green PCR Master Mix (Applied Biosystems/Life Technologies, USA) according to the manufacturer's instructions; this master mix contains a hot-start Taq DNA polymerase. All samples were analyzed in duplicate, and the average value is reported. 16S rRNA was used as the internal control gene for determining mRNA levels. The primers used in this study were designed using the web-based Oligo 7 Primer Analysis Software.
The SK primers were sense 5′-CATAAACTGGAAAAAGCCGATCTG-3′ and antisense 5′-…; the 16S rRNA primers were sense 5′-CTACGGGAGGCAGCAGTGG-3′ and antisense 5′-TATTACCGCGGCTGCTGGC-3′. Relative quantification was performed on a StepOne™ Real-Time PCR System (Applied Biosystems). The amplification conditions were 10 min at 95 °C followed by 45 cycles of 95 °C for 15 s and 60 °C for 1 min. The melting curve program ran from 60 to 95 °C at a heating rate of 0.1 °C/s with continuous fluorescence measurement, and a dissociation curve was plotted to confirm the specificity of the amplification products (Figure 1). Relative changes in gene expression were analyzed using the 2^(−ΔΔCt) method (13).
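The 2^(−ΔΔCt) calculation normalizes the SK Ct to the 16S rRNA Ct within each sample (ΔCt), subtracts the ΔCt of a calibrator sample (ΔΔCt), and converts the result to a fold change. The following is a minimal sketch that assumes roughly 100% amplification efficiency for both primer pairs (an assumption of the method) and averages duplicate wells as in the protocol. Using the uninduced control as calibrator is an illustrative choice, and the Ct values are hypothetical.

```python
from statistics import mean


def fold_change_ddct(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of replicate Ct values (e.g. duplicate wells),
    averaged before use. Assumes ~100% efficiency for both primer pairs.
    """
    d_ct_sample = mean(target_ct) - mean(reference_ct)             # SK vs 16S in the sample
    d_ct_calib = mean(calib_target_ct) - mean(calib_reference_ct)  # SK vs 16S in the calibrator
    dd_ct = d_ct_sample - d_ct_calib
    return 2.0 ** (-dd_ct)


# Hypothetical duplicate Ct values: induced culture vs. an uninduced calibrator.
print(fold_change_ddct([20.0, 20.0], [10.0, 10.0], [25.0, 25.0], [10.0, 10.0]))  # 32.0
```

Here the sample's ΔCt is 10 and the calibrator's is 15, so ΔΔCt = −5 and the fold change is 2^5 = 32.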

Determination of Streptokinase activity
SK activity was assayed by an endpoint chromogenic substrate method. In this assay, SK converts plasminogen to plasmin in solution in the absence of fibrin, and the plasmin formed cleaves the chromogenic substrate S-2251 (H-D-valyl-L-leucyl-L-lysine-p-nitroanilide dihydrochloride; Sigma, USA), releasing p-nitroaniline, which absorbs at 405 nm.
The substrate solution consisted of 1 mL of 0.5 M Tris-HCl (pH 7.4), 1 mL of 3 mM S-2251, and 5 μL of 10% Tween 20. It was kept at 37 °C, and 45 μL of human plasminogen solution (1 mg/mL) was added immediately before use. SK standards were tested at several concentrations to construct a dose-response curve: SK was diluted in 10 mM Tris-HCl (pH 7.4), 0.1 mM NaCl, and 1 mg/mL albumin to 4.0, 2.0, 1.0, and 0.5 IU/mL and kept at 37 °C in a microtiter plate. The reaction was started by adding 60 μL of SK solution to 40 μL of substrate solution. The absorbance of the wells was monitored at 405 nm for 20 min, after which the endpoint OD was recorded.
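The standards at 4.0, 2.0, 1.0 and 0.5 IU/mL serve to convert a sample's endpoint absorbance into activity. The following is a sketch of one common way to do this. It assumes the A405 response is linear over the standard range (the text does not state which curve model was used), uses NumPy, and treats the dilution factor as a hypothetical parameter. The absorbances are synthetic values chosen to lie exactly on a line, not measurements.

```python
import numpy as np

# Standard concentrations listed in the assay description (IU/mL).
STANDARDS_IU_PER_ML = np.array([4.0, 2.0, 1.0, 0.5])


def fit_standard_curve(conc, a405):
    """Linear least-squares fit A405 = slope * conc + intercept (assumed linear range)."""
    slope, intercept = np.polyfit(conc, a405, 1)
    return slope, intercept


def activity_iu_per_ml(a405_sample, slope, intercept, dilution=1.0):
    """Invert the standard curve and scale by the sample's dilution factor."""
    return (a405_sample - intercept) / slope * dilution


# Synthetic standards lying exactly on A405 = 0.2 * conc + 0.05 (not real data).
slope, intercept = fit_standard_curve(STANDARDS_IU_PER_ML, 0.2 * STANDARDS_IU_PER_ML + 0.05)
print(round(activity_iu_per_ml(0.45, slope, intercept, dilution=1000), 3))
```

A sample reading outside the absorbance range of the standards would need further dilution, because the linear fit is only trustworthy between the lowest and highest standards.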

Gene optimization, cloning and expression of optimized SK gene in E. coli BL21 (DE3)
Different organisms use the same codons but with different preferences. The coding sequence of the SK gene from S. pyogenes (GenBank Acc. No. M19347.1) was codon-optimized for E. coli and synthesized. The optimized synthetic gene (GenBank KT156726.1) and the wild-type sequence (GenBank M19347.1) share 78% identity. During optimization, 275 nucleotides were changed, resulting in optimization of 235 codons (56.5%) and elimination of 23 rare codons (Table 1). The GC content increased from 40% to 45%, closer to the average GC content of highly expressed E. coli genes in the Kazusa database. Moreover, after optimization no cryptic splice sites, internal Chi sites or ribosome-binding sites, negative CpG islands, repeat sequences, restriction sites that could interfere with cloning, RNA instability motifs (AREs), or problematic mRNA secondary structures were detected.
The SK gene fragment was excised from the pUC-SK cloning vector and ligated into the pET21b expression vector. The fragment (~1300 bp), containing the streptokinase gene and the PelB signal sequence, was inserted into the NdeI/BamHI sites downstream of the T7 promoter. E. coli BL21 (DE3) was transformed with the recombinant pET21b-SK plasmid. Colony-PCR screening of transformants showed that the SK gene was successfully cloned (Figure 2), and sequencing confirmed the accuracy of the cloning.
The recombinant SK gene was expressed using IPTG as the inducer, and the expressed protein was analyzed by SDS-PAGE. SDS-PAGE of the total lysate of induced E. coli BL21 (DE3) showed a protein band of the expected size, with a molecular weight of ~47 kDa (Figure 3).
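The identity and GC-content figures above are simple sequence statistics. The following sketch computes them for two aligned coding sequences of equal length without gaps, which is the usual case for a codon-optimized gene and its wild-type template because synonymous substitutions do not change length. The function names and the toy sequences (Met followed by two arginines, encoded by rare AGA codons versus common CGT codons) are illustrative and are not the M19347.1 or KT156726.1 sequences.

```python
def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def percent_identity(a, b):
    """Percent identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def changed_codons(a, b):
    """Number of in-frame codons that differ between two aligned coding sequences."""
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("need equal-length sequences with a whole number of codons")
    return sum(a[i:i + 3].upper() != b[i:i + 3].upper() for i in range(0, len(a), 3))


wild_type, optimized = "ATGAGAAGA", "ATGCGTCGT"  # toy example, not the real genes
print(gc_content(wild_type), gc_content(optimized))
print(percent_identity(wild_type, optimized), changed_codons(wild_type, optimized))
```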

Experimental design and modeling for SK optimization by RSM
After SK expression had been confirmed under the basal conditions, the optimum levels of the significant cultivation conditions (IPTG concentration, post-induction period, and cell density at induction, OD600) were sought. Each of the three variables was therefore tested at five coded levels (−α, −1, 0, +1, +α) (Table 2), and 20 experiments, including six replicates of the center point, were designed to optimize the selected parameters (Table 3). Table 3 also lists the percentage of expressed SK mRNA measured by real-time PCR (Actual column) alongside the values predicted by the model (Predicted column). The predicted levels of SK activity (at the mRNA expression level) as a function of …
Table 4 presents the F-test and the analysis of variance (ANOVA) for the response surface quadratic model, confirming the statistical significance of the above equation. The model F-value of 25.31, with a very low probability value (Prob > F < 0.0001), shows that the model is significant: there is only a 0.01% chance that a model F-value this large could arise from noise. Model terms with "Prob > F" below 0.0500 are significant; in this case B, C, A², B², and C² are significant model terms, whereas values greater than 0.1000 indicate non-significant terms. If there are many non-significant model terms (not counting those required to support hierarchy), model reduction may improve the model. Diagnostic plots were also used to assess the adequacy of the regression model, and the coefficient of determination (R²) was calculated to check the model fit; the closer R² is to 1, the better the model fits the data and predicts the response. The actual values are those obtained experimentally for each run, whereas the predicted values are calculated from the independent variables using the CCD model.
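The 20-run design described above matches the structure of a three-factor central composite design: 2³ = 8 factorial points at ±1, 2 × 3 = 6 axial points at ±α, and 6 center points. The sketch below generates such a design in coded units using NumPy. It assumes the rotatable value α = (2³)^(1/4) ≈ 1.682, which is a common choice, although the α actually used is not stated here. It then converts coded levels to actual settings via X = center + x · step. The center and step values in the example are hypothetical, not the Table 2 ranges.

```python
import itertools

import numpy as np


def central_composite_design(k=3, n_center=6, alpha=None):
    """Coded CCD: 2**k factorial points, 2k axial points at +/-alpha, n_center centers."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable choice; about 1.682 for k = 3
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i] = -alpha
        axial[2 * i + 1, i] = alpha
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])


def coded_to_actual(coded, center, step):
    """Convert coded levels x to actual settings X = center + x * step, per factor."""
    coded = np.asarray(coded, dtype=float)
    return np.asarray(center, dtype=float) + coded * np.asarray(step, dtype=float)


design = central_composite_design()
print(design.shape)  # (20, 3)

# Hypothetical centers/steps for (IPTG mM, post-induction h, OD600), NOT Table 2.
print(coded_to_actual([1.0, -1.0, 0.0], center=[0.5, 8.0, 1.5], step=[0.2, 2.0, 0.5]))
```

Each factor therefore takes exactly five distinct coded values (−α, −1, 0, +1, +α), which matches the five-level layout of Table 2.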
The R² value was 0.96, indicating that the regression model for SK overexpression fits the experimental values well (Figure 4). The effect of each factor on SK activity (at the mRNA expression level) and the optimum levels are depicted in 3D surface plots (Figure 5). The optimum levels of the selected parameters were determined to be 0.28 mM for IPTG concentration, 9.889 h for the post-induction period, and 3.40768 for cell density (OD600). The predicted maximum SK activity (at the mRNA expression level) was 93.64%, in agreement with the 94.27% obtained experimentally (Figure 6).
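The R² and model F-value reported in this section come from the regression ANOVA. The following is a minimal NumPy sketch of the standard formulas R² = 1 − SS_res/SS_tot and F = (SS_model/df_model)/(SS_res/df_res), with df_model = p − 1 and df_res = n − p for p fitted coefficients. Assuming the full 10-term quadratic model and n = 20 runs, F would be compared against an F distribution with 9 and 10 degrees of freedom. The function name is hypothetical, and the example data are synthetic.

```python
import numpy as np


def r2_and_model_f(y, y_pred, n_params):
    """Return (R^2, model F-value) for a least-squares fit that includes an intercept.

    n_params counts all fitted coefficients, intercept included
    (10 for the full quadratic model in three factors).
    """
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y.size
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    ss_res = np.sum((y - y_pred) ** 2)     # residual sum of squares
    ss_model = ss_tot - ss_res             # holds for OLS fits with an intercept
    r2 = 1.0 - ss_res / ss_tot
    f_value = (ss_model / (n_params - 1)) / (ss_res / (n - n_params))
    return r2, f_value


# Tiny synthetic check with a straight-line fit (2 parameters).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])
slope, intercept = np.polyfit(x, y, 1)
print(r2_and_model_f(y, slope * x + intercept, n_params=2))
```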

Refolding and comparison of SK activity before and after optimization
Ultimately, the total activity assay of SK produced under the optimum IPTG concentration, post-induction period, and cell density showed a significant increase compared with SK activity under the primary expression conditions (0.5 mM IPTG, 12 h post-induction period, and cell density OD600 of 0.8). SK activity under the optimum expression conditions (7663 IU/mL) was 4.14-fold higher than under the primary expression conditions (1853 IU/mL).
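The fold increase quoted above is simply the ratio of the two volumetric activities:

$$\frac{7663\ \text{IU/mL}}{1853\ \text{IU/mL}} \approx 4.14$$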

Discussion
Achieving optimum expression conditions for a recombinant protein depends on various factors (14,15). Plasmid expression imposes a metabolic burden on the host cell, which can reduce the specific growth rate and biomass yield and cause plasmid instability (16). In addition, the onset of glucose overflow metabolism and acetate formation, two factors detrimental to recombinant protein production, determines the upper limit of the specific growth rate (17)(18)(19). These considerations underline the need to identify optimum conditions for overexpressing recombinant proteins. We therefore employed RSM, which has proved to be one of the most accurate multivariate analysis methods, to determine the optimum culture conditions for SK expression by simultaneously varying IPTG concentration, post-induction period, and cell density at induction (OD600).
Different bacterial species do not share the same codon preferences for translation. Biased codon usage is therefore one of the major factors affecting heterologous gene expression and should be addressed properly. Rare codons can decrease mRNA stability and translation rate, and high G+C content can reduce translational yield or even cause expression to fail (20). Codon optimization, a genetic technique in which the rare codons of a gene are replaced throughout with codons favored by the host, can enhance recombinant protein expression 2- to 3-fold (20-23). Moreover, gene optimization algorithms can simultaneously address cryptic splice sites, internal Chi sites and ribosome-binding sites, negative CpG islands, repeat sequences, restriction sites that may interfere with cloning, RNA instability motifs (AREs), and mRNA secondary structure, the last of which affects translation efficiency (24). This technique was used here to resolve the difference in codon usage between S. pyogenes and E. coli and to achieve optimum expression of the SK gene in the host cell.
Chemical synthesis is the most practical way to obtain a codon-optimized version of a target gene. Adding the PelB signal peptide to the SK gene during synthesis directs the protein to the periplasmic space. Periplasmic expression also separates SK from cytoplasmic impurities and provides an oxidizing environment for disulfide bond formation, helping to preserve the protein's biological structure and activity (25).
IPTG concentration, post-induction period, and cell density have been reported to be among the most important production conditions to optimize for high-yield recombinant protein expression (26). The conventional approach to optimizing production conditions is to vary one parameter at a time while keeping the others constant. However, because of the large number of experiments required, this approach is impractical when many parameters are considered. Besides being cumbersome, it can lead to misinterpretation of the results when the variables interact (27). RSM is a commonly used alternative that overcomes these drawbacks. It is a mathematical and statistical tool for designing optimization experiments, building models, and studying interactions among bioprocess parameters while running as few experiments as possible (28). Using five levels of each variable in the experimental design appears to be more efficient than using three levels for reaching the best results. In addition to determining the optimum conditions, optimization at five levels shows whether the selected variable ranges were appropriate. The point prediction tool of the software was used to determine the optimum IPTG concentration (0.28 mM), post-induction period (9.889 h), and cell density (OD600 3.40768). To better visualize the effects of the three factors on SK production, the models were presented as 3D response surfaces. Our RSM approach increased SK production 4.14-fold. To the best of our knowledge, this is the highest yield of periplasmic SK production in shake flasks reported so far (29-32).
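The efficiency argument for five-level RSM can be made concrete with a run count. Assuming the design is the standard three-factor CCD with six center points, as the 20 runs described in the Methods suggest, a full factorial at the same five levels would need many more runs:

$$\underbrace{5^3 = 125}_{\text{full five-level factorial}} \qquad \text{vs.} \qquad \underbrace{2^3 + 2 \times 3 + 6 = 20}_{\text{CCD: factorial + axial + center}}$$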
Although E. coli is a highly characterized expression host for which many expression settings have been developed, the early log phase of bacterial growth is generally thought to be the best stage for inducing protein expression. Our observations contradict this: the highest recombinant SK production was obtained when induction occurred in the middle to late log phase. Galloway et al., Chae et al., and Samarin et al. have likewise reported high-yield recombinant protein expression close to the end of the log phase (33-35), with increased soluble expression of the target protein and reduced proteolysis in the cytoplasm. These studies, together with the study by J. Ou et al., are consistent with our results (36). Notably, this finding applies to heat-shock-inducible promoters as well as lac-based promoters (36). It has been suggested that E. coli cells in the middle or toward the end of the log phase behave like cells at the beginning of the log phase (with high physiological activity) with respect to transcription, translation, and protein folding. Moreover, secretion of the expressed recombinant protein appears to be facilitated close to the end of the log phase. These phenomena may arise because the metabolic flux of the bacteria is diverted toward producing the target protein, which could also explain the growth arrest observed about 1 h after induction at the end of this phase.
IPTG concentration is another factor that must be considered for optimum protein expression. The high cost and potential cytotoxicity of IPTG make it an important optimization target: at high concentrations, IPTG can significantly reduce the growth rate and induce bacterial proteases that degrade heterologous proteins. Although the IPTG concentrations used for induction range widely, from 0.005 to 5 mM (37), our study found the optimum to be as low as 0.28 mM. Similarly, Larentis et al. reported that IPTG at a concentration 10-fold lower than usually used could give optimum expression of a recombinant protein (38).
In conclusion, RSM successfully identified optimized conditions for recombinant SK production. Achieving higher SK yields in shake flasks could lead to more cost-effective industrial production of this drug, which is the ultimate aim of SK production studies. Although our investigation achieved a 4.14-fold increase in production, optimizing other influential parameters of protein expression could yield still higher SK production.